Please type a plus sign (+) inside this box [+] PTO/SB/05 (12/97)

Approved for use through 09/30/00. OMB 0651-0032
Patent and Trademark Office: U.S. DEPARTMENT OF COMMERCE

Under the Paperwork Reduction Act of 1995, no persons are required to respond to a collection of information unless it displays a valid OMB control number.



UTILITY PATENT APPLICATION TRANSMITTAL 

(Only for new nonprovisional applications under 37 CFR 1.53(b))

Attorney Docket No. 042390.P9766 Total Pages 2

First Named Inventor or Application Identifier Steven M. Bennett 

Express Mail Label No. EL627465225US 



ADDRESS TO: Assistant Commissioner for Patents 

Box Patent Application 
Washington, D. C. 20231 



APPLICATION ELEMENTS 

See MPEP chapter 600 concerning utility patent application contents. 

1. X Fee Transmittal Form

(Submit an original, and a duplicate for fee processing) 

2. X Specification (Total Pages 26 ) 

(preferred arrangement set forth below) 

- Descriptive Title of the Invention 

- Cross References to Related Applications 

- Statement Regarding Fed sponsored R&D

- Reference to Microfiche Appendix 

- Background of the invention 

- Brief Summary of the Invention 

- Brief Description of the Drawings (if filed) 

- Detailed Description 

- Claims 

- Abstract of the Disclosure 

3. X Drawing(s) (35 USC 113) (Total Sheets 4)

4. X Oath or Declaration (Total Pages 

a. Newly Executed (Original or Copy) 

b. Copy from a Prior Application (37 CFR 1.63(d))

(for Continuation/Divisional with Box 17 completed) (Note Box 5 below)

i. DELETIONS OF INVENTOR(S) Signed statement attached deleting

inventor(s) named in the prior application, see 37 CFR 1.63(d)(2)
and 1.33(b). 

5. _ Incorporation By Reference (useable if Box 4b is checked) 

The entire disclosure of the prior application, from which a copy of the oath or 
declaration is supplied under Box 4b, is considered as being part of the 
disclosure of the accompanying application and is hereby incorporated by 
reference therein. 

6. _ Microfiche Computer Program (Appendix) 

7. _ Nucleotide and/or Amino Acid Sequence Submission

(if applicable, all necessary)

a. Computer Readable Copy

b. Paper Copy (identical to computer copy)

c. Statement verifying identity of above copies

ACCOMPANYING APPLICATION PARTS 

8. _ Assignment Papers (cover sheet & document(s))

9. a. 37 CFR 3.73(b) Statement (where there is an assignee) 

_ b. Power of Attorney 

10. _ English Translation Document (if applicable)

11. _ a. Information Disclosure Statement (IDS)/PTO-1449 
_ b. Copies of IDS Citations 

12. _ Preliminary Amendment 

13. X Return Receipt Postcard (MPEP 503) (Should be specifically itemized)

14. a. Small Entity Statement(s) 

b. Statement filed in prior application, Status still proper and desired 

15. _ Certified Copy of Priority Document(s) (if foreign priority is claimed)



16. X Other: Copy of postcard with Certificate of Mailing and Express Mail sticker



17. If a CONTINUING APPLICATION, check appropriate box and supply the requisite information: 

Continuation Divisional Continuation-in-part (CIP) 

of prior application No: 



18. Correspondence Address

Customer Number or Bar Code Label 



(Insert Customer No. or Attach Bar Code Label here) 

or 



X Correspondence Address Below 



NAME Thomas S. Ferrill, Reg. No. 42,532
BLAKELY, SOKOLOFF, TAYLOR & ZAFMAN LLP

ADDRESS 12400 Wilshire Boulevard

Seventh Floor

CITY Los Angeles STATE California ZIP CODE 90025-1026



Country U.S.A. TELEPHONE (408) 720-8300 FAX (408) 720-9397 



EXPRESS MAIL CERTIFICATE OF MAILING

"Express Mail" mailing label number: EL627465225US

Date of Deposit: 9/29/2000 



I hereby certify that I am causing this paper or fee to be deposited with the United States Postal Service "Express Mail Post Office to 
Addressee" service on the date indicated above and that this paper or fee has been addressed to the Assistant Commissioner for Patents 
Washington, D. C. 20231 
Michelle Offenbaker 




(Date signed) 



PTO/SB/17 (6/99)

Approved for use through 09/30/2000. OMB 0651-0032
Patent and Trademark Office: U.S. DEPARTMENT OF COMMERCE

Under the Paperwork Reduction Act of 1995, no persons are required to respond to a collection of information unless it displays a valid OMB control number.



Complete if Known: 
Application No. Unassigned
Filing Date Herewith 



FEE TRANSMITTAL FOR FY 2000 

TOTAL AMOUNT OF PAYMENT ($) 1008.00 



First Named Inventor Steven M. Bennett 
Group Art Unit Unassigned



Examiner Name Unassigned

Attorney Docket No. 042390.P9766 



METHOD OF PAYMENT (check one) 

[ ] The Commissioner is hereby authorized to charge indicated fees and credit
any over payments to: 

Deposit Account Number 02-2666 

Deposit Account Name 



[ X ] Charge Any Additional Fee Required Under 37 CFR 1.16 and 1.17



Payment Enclosed: 

_X Check 

Money Order 

Other 



FEE CALCULATION 

1. BASIC FILING FEE

Large Entity     Small Entity
Fee    Fee       Fee    Fee
Code   ($)       Code   ($)      Fee Description                       Fee Paid
101    690       201    345      Utility application filing fee        690.00
106    310       206    155      Design application filing fee
107    480       207    240      Plant filing fee
108    690       208    345      Reissue filing fee
114    150       214    75       Provisional application filing fee

SUBTOTAL (1) $690.00



2. EXTRA CLAIM FEES

                                   Extra       Fee from
                                   Claims      below         Fee Paid
Total Claims          29 - 20** =  9        x  18.00     =   162.00
Independent Claims     5 -  3** =  2        x  78.00     =   156.00
Multiple Dependent
**Or number previously paid, if greater; For Reissues, see below.

Large Entity     Small Entity
Fee    Fee       Fee    Fee
Code   ($)       Code   ($)      Fee Description
103    18        203    9        Claims in excess of 20
102    78        202    39       Independent claims in excess of 3
104    260       204    130      Multiple dependent claim, if not paid
109    78        209    39       **Reissue independent claims over original patent
110    18        210    9        **Reissue claims in excess of 20 and over original patent

SUBTOTAL (2) $318.00
FEE CALCULATION (continued)



3. ADDITIONAL FEES 


Patent fees are subject to annual revisions. Small Entity payments must be supported by a small entity statement, otherwise large 
entity fees must be paid. 
See Forms PTO/SB/09-12 



Large Entity     Small Entity
Fee    Fee       Fee    Fee
Code   ($)       Code   ($)      Fee Description                                          Fee Paid
105    130       205    65       Surcharge - late filing fee or oath
127    50        227    25       Surcharge - late provisional filing fee or cover sheet
139    130       139    130      Non-English specification
147    2,520     147    2,520    For filing a request for reexamination
112    920*      112    920*     Requesting publication of SIR prior to Examiner action
113    1,840*    113    1,840*   Requesting publication of SIR after Examiner action
115    110       215    55       Extension for response within first month
116    380       216    190      Extension for response within second month
117    870       217    435      Extension for response within third month
118    1,360     218    680      Extension for response within fourth month
128    1,850     228    925      Extension for response within fifth month
119    300       219    150      Notice of Appeal
120    300       220    150      Filing a brief in support of an appeal
121    260       221    130      Request for oral hearing
138    1,510     138    1,510    Petition to institute a public use proceeding
140    110       240    55       Petition to revive unavoidably abandoned application
141    1,210     241    605      Petition to revive unintentionally abandoned application
142    1,210     242    605      Utility issue fee (or reissue)
143    430       243    215      Design issue fee
144    580       244    290      Plant issue fee
122    130       122    130      Petitions to the Commissioner
123    50        123    50       Petitions related to provisional applications
126    240       126    240      Submission of Information Disclosure Stmt
581    40        581    40       Recording each patent assignment per property (times number of properties)
146    690       246    345      For filing a submission after final rejection (see 37 CFR 1.129(a))
149    690       249    345      For each additional invention to be examined (see 37 CFR 1.129(a))



Other fee (specify) 
Other fee (specify) 



*Reduced by Basic Filing Fee Paid 



SUBTOTAL (3) $ 0.00 



SUBMITTED BY:

Typed or Printed Name: Thomas S. Ferrill

Signature                    Date

Reg. Number 42,532

Deposit Account User ID 02-2666 (complete if applicable)



EXPRESS MAIL CERTIFICATE OF MAILING

"Express Mail" mailing label number: EL627465225US

Date of Deposit: 9/29/2000 



I hereby certify that I am causing this paper or fee to be deposited with the United States Postal Service "Express Mail Post Office to 
Addressee" service on the date indicated above and that this paper or fee has been addressed to the Assistant Commissioner for Patents 
Washington, D. C. 20231

Michelle Offenbaker

(Typed or printed name of person mailing paper or fee)

(Signature of person mailing paper or fee)




(Date signed) 











Serial/Patent No.: *****
Client: Intel Corporation
Title: CHANGING CHARACTERISTICS OF A VOICE USER INTERFACE

Filing/Issue Date: Sept. 29, 2000

BSTZ File No.: 042390.P9766
Date Mailed: 9/29/2000
Atty/Secty Initials: TSF/mro
Docket Due Date: *****

The following has been received in the U.S. Patent & Trademark Office on the date stamped hereon:

Amendment/Response ( pgs.)
Appeal Brief ( pgs.) (in triplicate)
Application - Utility (25 pgs. - cover and abstract)
Application - Rule 1.53(b) Continuation
Application - Rule 1.53(b) Divisional
Application - Rule 1.53(b) CIP ( pgs.)
Application - Rule 1.53(d) CPA Transmittal
Application - Design ( pgs.)
Application - PCT ( pgs.)
Application - Provisional ( pgs.)
Assignment and Cover Sheet
Certificate of Mailing
Declaration & POA ( pgs.)
nsdcHRDacs& Qg ACcpyrflrertofeSgncdLatrC
Drawings: 4 # of sheets includes _ figures
Express Mail No.: EL627465225US
Month(s) Extension of Time
Information Disclosure Statement & PTO-1449 ( pgs.)
Issue Fee Transmittal
Notice of Appeal
Petition for Extension of Time
Petition for
Postcard
Power of Attorney ( pgs.)
Preliminary Amendment ( pgs.)
Reply Brief ( pgs.)
Response to Notice of Missing Parts
Small Entity Declaration for Indep. Inventor/Small Business
Transmittal Letter, in duplicate
Fee Transmittal, in duplicate

Check No.3fiQ2D
Amt: $1008.00

Check No.
Amt:

Other: Copy of postcard with Certificate of Mailing and Express Mail sticker



PATENT 
[042390.P9766] 



APPLICATION FOR A UNITED STATES PATENT 

for 

CHANGING CHARACTERISTICS OF A VOICE USER INTERFACE 

by 

STEVEN M. BENNETT 



EXPRESS MAIL MAILING LABEL 

NUMBER EL627465225US 



DATE OF DEPOSIT 9/29/2000 



I hereby certify that this paper or fee is being deposited with the United States Postal Service 
"EXPRESS MAIL POST OFFICE TO ADDRESSEE" service under 37 C.F.R. 1.10 on the date 
indicated above and is addressed to: Assistant Commissioner for Patents, Washington, D.C. 20231.







FIELD OF THE INVENTION

This invention generally relates to voice processing systems. More specifically, the
invention relates to using either or both user-specific contextual information and 
environmental information to make changes in a voice user interface. 

BACKGROUND OF THE INVENTION 

A voice processing system comprehends human language thereby allowing a user to 
give commands or make requests to the system by speaking in a human language and having 
the system respond by voice. 

An airline's departure and arrival voice processing system is an example of a 
rudimentary voice processing system. Figure 1 illustrates an exemplary static call flow in a 
voice user interface. Referring to figure 1, the user interface illustrated is typical of the kind of 
static user interface that a user might encounter when using a voice processing system built 
using previous technology. The user interface welcomes the user and then presents two 
options to the user. A first voice prompt 102 asks the user to state whether the user's flight is
arriving or departing. The user verbally responds by stating whether the flight is arriving or 
departing. After receiving the user's response, a voice prompt 104 asks the user to state the 
flight number of interest. The user states the flight number. The system and the user repeat 
this process to obtain the flight's departure/arrival date 106 and the flight's arrival/departure 
city 108. Next, the voice processing system repeats the information back to the user to ensure 
that the system comprehends the user's request. The system then retrieves that particular 
flight's information from a database 110. Finally, the system communicates the retrieved
flight information to the user. 
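
The static call flow of FIG. 1 can be pictured as a fixed list of prompts that is always walked in the same order. The following is only a minimal sketch under that reading; the slot names, the prompt wording, and the `ask` callback are hypothetical and do not come from the system described.

```python
from typing import Callable, Dict, List, Tuple

# Fixed prompt order modeled on FIG. 1 (prompts 102, 104, 106, 108).
# Slot names and prompt wording are hypothetical.
STATIC_PROMPTS: List[Tuple[str, str]] = [
    ("direction", "Is your flight arriving or departing?"),
    ("flight_number", "What is the flight number?"),
    ("date", "What is the departure or arrival date?"),
    ("city", "What is the arrival or departure city?"),
]


def run_static_call_flow(ask: Callable[[str], str]) -> Dict[str, str]:
    """Ask every prompt in order; no prompt can be skipped or reordered."""
    answers: Dict[str, str] = {}
    for slot, prompt in STATIC_PROMPTS:
        # `ask` stands in for playing a prompt and recognizing the reply.
        answers[slot] = ask(prompt)
    return answers
```

Because the loop never consults anything it knows about the caller, every call costs the full four prompts, which is the limitation the following paragraphs describe.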

The example voice processing system has a static user interface structure. The system
delivers information to the user based on the user's requests or commands, not based on the
system possessing knowledge regarding the user. In this example, the voice processing system
must complete the full sequence of voice prompts before retrieving the desired information. 
Thus, the user must take the time to navigate through those successive voice prompts. 

This system does not deliver content to the user based on the system having any 
knowledge about the user. The airline system possesses knowledge about the user's upcoming 
flight plans, for example, through the user's reservation number or frequent flyer account 
number. However, the system forces the user to step through the static call flow for each 
segment of the trip. Thus, if the user has a connecting flight, then the user must give the
system the user's information and step through the static call flow again. Additionally, 
although the airline system possesses knowledge of the user's flight plans, the system does not 
proactively notify the user of a schedule conflict, such as a flight delay on the second portion 
of the user's trip, which affects the rest of the user's flight plans. 

Users of voice processing systems are mobile. The mobile user may access the voice 
processing system from many locations such as a moving vehicle, a quiet office, a noisy 
airport, etc. However, current voice processing systems do not alter their privacy and security
requirements or operational characteristics based on environmental characteristics. This is a 
problem for the mobile user who accesses the system from a variety of devices and in a variety
of circumstances. In these situations, the behavior of the system should change to be more 
useful, understandable, private and secure. 

Some voice processing systems allow limited customization of the structure of the 
voice user interface and the content that is delivered to the user. However, after the user
interface and content are customized, the user interface and the content that it
delivers remain static. Users may be forced to skim through excessive amounts of non-pertinent
information before hearing the information most important to them. When the
system forces the user to skim through non-pertinent information, then two problems arise: 1) 
the user remains connected to the system longer, thereby, tying up more system resources; and 
2) the user becomes frustrated with the system. 

The invention provides a solution to some of these disadvantages that exist with these 
current systems. 

BRIEF DESCRIPTION OF THE DRAWINGS



The drawings refer to the invention in which: 

FIG. 1 illustrates an exemplary static call-flow in a voice user interface; 
FIG. 2 illustrates an embodiment of a system which dynamically changes the voice user 
interface of the system and content communicated to the user based upon either or both user- 
specific contextual information and the environmental information; 
FIG. 3 illustrates a flow chart of how the speech module generates a grammar file for an 
information item in the top database table; and 

FIG. 4 illustrates a flow chart of a dynamically generated call-flow in the voice user interface. 

While the invention is subject to various modifications and alternative forms, specific 
embodiments thereof have been shown by way of example in the drawings and will herein be 
described in detail. The invention should be understood to not be limited to the particular 
forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, 
and alternatives falling within the spirit and scope of the invention as defined by the appended 
claims. 

DETAILED DISCUSSION

A voice processing system comprehends human language thereby allowing a user to 
give commands or make requests to the system by speaking in a human language and having 
the system respond. The discussion below centers primarily around voice processing systems 
which are telephony-based; that is, the user interacts with the system over a telephone 
connection using his voice. We note, however, that the invention described here is not limited 
to telephony systems, but in fact includes all voice processing systems regardless of the type 
of communication device 202 or transmitive network 204 involved. 

The voice user interface is a means by which a user and the system interact, typically, 
using speech or other audio tones as the communication method. In the telephony 
environment, this is sometimes referred to as a call-flow. 

Content is information that is potentially of interest to a user. Content may be 
communicated to a user either because the user requests the information or because the system
intelligently chooses to present the information to the user. For example, if the system is 
aware of the user's itinerary and that the user's airplane flight has just been canceled, then the 
system may choose to present to the user the content that the flight is canceled and the flight 
times associated with alternative flights. By contrast, the system would not choose to
deliver the content of alternative flight times to other users who are not scheduled to be on the
canceled flight. 

User-specific contextual information is information that the system knows about a 
particular user such as the user's identity, current location, current task, calendar, schedule, or 
other similar information. 

A communication device 202 is a device such as a cell phone, a land-line phone, a
speakerphone, a wireless headset, or other similar device capable of transmitting a human's 
voice. 

The term audio scene refers to the ambient sound environment at the location of the 
user. Example audio scenes are a moving vehicle, a quiet office cubicle, or an airport with a 
noisy background filled with various human voices and non-speech sounds. 

Environmental information is information such as details of the user's chosen 
communication device 202, details of the communication channel, or audio scene information. 

When interacting with a voice-automated system, the user may interrupt the system 
when the system is speaking to the user. This is referred to as a barge-in. When the user 
barges-in on the system, the system cuts off the system's output mid-stream. Typically, a user 
initiated barge-in expedites the user's capability to get to pertinent information in a more 
timely manner. However, extraneous background noise may cause a false barge-in when the 
noise level becomes high enough. Sources of this noise include public address 
announcements, a car horn blowing, a user's cough or rough handling of the phone. A false 
barge-in may cut off the pertinent information that the user wants to hear. The false barge-in 
forces the user to request that the system repeat the information. A false barge-in lengthens the 
call and increases the frustration level of the user. Additionally, the user may become 
confused. All these factors from a false barge-in directly impact the cost of providing the 
service. 

Heterogeneous information is data that is not all the same type. In an embodiment of
the invention, the heterogeneous information sources include the user's e-mail, voice mail,
calendar, schedule, flight information, weather information, traffic information, hotel
information, rental car information, sports, stocks, news, personal information manager (PIM)
information (contacts, tasks), as well as particular categories of interest selected by the user.

Figure 2 illustrates an embodiment of a system that dynamically
changes the voice user interface of the system and content communicated to the user based
upon either or both user-specific contextual information and environmental information. A 
user interacts with the system through a communication device 202. The user's analog or 
digital voice signal travels to the system across the corresponding transmitive network 204, 
such as a Public Switched Telephone Network, a satellite network, or other similar network. 
The transmitive network 204 may carry analog or digital signals. The system receives the 
user's voice signal at a device such as a telephony interface device 206. If necessary, the 
telephony interface device 206 converts the user's analog voice signal into a stream of 
digitized voice data. This digital voice data is sent to the speaker verification module 208 and 
the speech recognizer 210. The telephony interface device 206 acts as a call control center by 
detecting that an incoming phone call has been received. The telephony interface device 206 
then communicates that an incoming phone call is occurring to the speech module 212. The 
telephony interface device 206 takes the incoming line off-hook, i.e. the telephony interface 
device 206 answers the phone. The telephony interface device 206 accepts digital audio 
signals from either the text to speech engine 222 or the pre-recorded voice file 220. The 
telephony interface device 206 converts the digital signal to analog, if necessary. The digital 
data to be transmitted may be in a variety of forms, such as wave, MP3, raw audio files or
some other digital form. The speech module 212 may direct a pre-recorded voice file 220 to
the telephony interface device 206, which in turn transmits the pre-recorded voice file 220
onto the transmitive network 204. In this instance, the pre-recorded voice file 220 might
answer the phone by saying "Welcome to the System, what can I do for you?"

Each different type of communication device 202 possesses unique audio 
characteristics, i.e. channel characteristics, that differ significantly. The telephony interface 
device 206 may characterize channel characteristics of each communication device 202 and 
communicate the channel characteristics information to the speech module 212. The speech 
module 212 compares these characteristics to the channel characteristics of classes of devices 
stored in the database 214. With this method, the speech module 212 estimates the type of 
communication device 202 that the user is using to communicate with the system. For 
example, the speech module 212 may estimate that the user is calling from a speakerphone, or 
that the user is calling from a cell phone. In an alternate embodiment, the user may verbally
tell the system the type of communication device 202 that the user is using to communicate
with the system. Additionally, the speech module 212 may use the phone number assigned to 
the communication device 202 or caller id information of the communication device 202 to 
cross reference information stored in the database 214 to aid in determining details of the 
communication device 202. If this method can be used because the caller id information is 
available and the database has information on the device associated with this number, then 
this method has been found to be highly accurate. 
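
One way to picture the comparison against stored device classes is a nearest-profile match over a small feature vector. This is a sketch only: the text does not say which channel features or which distance measure are used, so the three features, their values, and the Euclidean distance below are assumptions made for illustration.

```python
import math
from typing import Dict, Sequence


def estimate_device_class(measured: Sequence[float],
                          class_profiles: Dict[str, Sequence[float]]) -> str:
    """Return the stored device class whose profile is closest to the measurement."""
    def distance(a: Sequence[float], b: Sequence[float]) -> float:
        # Plain Euclidean distance; an assumed stand-in for the real comparison.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    return min(class_profiles,
               key=lambda name: distance(measured, class_profiles[name]))


# Hypothetical profiles: (bandwidth in kHz, noise floor in dB, echo level in dB).
PROFILES = {
    "land_line": (3.4, -60.0, -40.0),
    "cell_phone": (3.1, -45.0, -35.0),
    "speakerphone": (3.4, -40.0, -15.0),
}
```

A call such as `estimate_device_class((3.3, -41.0, -17.0), PROFILES)` would pick the speakerphone profile here. A caller id lookup, where available, could override this estimate.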

The telephony interface device 206 or speech recognizer 210 may also estimate the
audio scene characteristics associated with the user's current location. In an alternative
embodiment, the speech recognizer 210 estimates the audio scene characteristics associated
with the user's current location. The telephony interface device 206 or speech recognizer 210
sends the audio scene information to the speech module 212. The speech module 212
compares these characteristics to the channel characteristics of classes of audio scenes stored
in the database 214. In alternative embodiments, the user may tell the system the type of audio 
scene environment that the user is located within. Additionally, the speech module 212 may 
use the phone number of the communication device 202 or caller id information to cross 
reference information stored in the database 214 to aid in determining the audio scene 
information. If this method can be used because either the caller id information is available 
and the database has information on the associated device, or the location of the device is 
fixed and the database has information on the associated location, then this method has been 
found to be highly accurate. 

The telephony interface device 206 detects the sound level of the user's voice at the 
board's input. If the telephony interface device 206 detects a sound above the barge-in level, 
then the board stops generating sound at the board's output. Outbound sound degrades the 
quality of the incoming sound due to echo paths in the transmission lines. By cutting off the 
output, the speech recognizer 210 can do a better job of recognizing the sounds that the 
system is receiving at the input. 

The speech module 212 may set the barge-in threshold through an Application 
Program Interface (API) in the telephony interface device 206 or an API in the speech 
recognizer 210. The speech module 212 may use the acquired environmental information as 
well as user-specific contextual information to determine the appropriate barge-in level. By 
appropriately setting the barge-in level, the system reduces false barge-in occurrences. 
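
The barge-in adjustment can be sketched as a base level plus offsets chosen from the environmental information. The decibel values, the offset tables, and the function names below are hypothetical; the text states only that the level is set from environmental and user-specific information through an API.

```python
# Hypothetical default barge-in level in dB; not a value from the text.
BASE_BARGE_IN_DB = -30.0

# Hypothetical offsets: noisier scenes and devices raise the level.
SCENE_OFFSET_DB = {"quiet_office": 0.0, "moving_vehicle": 6.0, "noisy_airport": 12.0}
DEVICE_OFFSET_DB = {"land_line": 0.0, "cell_phone": 3.0, "speakerphone": 6.0}


def barge_in_level(audio_scene: str, device: str) -> float:
    """Pick a barge-in level from the estimated audio scene and device class."""
    return (BASE_BARGE_IN_DB
            + SCENE_OFFSET_DB.get(audio_scene, 0.0)
            + DEVICE_OFFSET_DB.get(device, 0.0))


def is_barge_in(input_level_db: float, threshold_db: float) -> bool:
    """Treat input as a barge-in only when it is louder than the threshold."""
    return input_level_db > threshold_db
```

With these made-up numbers, a sound at -20 dB would interrupt the system for a caller in a quiet office on a land line, but not for a caller on a cell phone in a noisy airport, which is the intended reduction in false barge-ins.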

When signaled by the layer of intelligence 218, the speech module 212 references the 
database 214 and sends a notification to the user by directly phoning the user. The layer of 
intelligence 218 sends this notification command if the layer of intelligence 218 recognizes 
that a high priority item from the top database table 216 requires the user's immediate
attention. In one embodiment, the layer of intelligence 218 starts with the least intrusive
method and, upon not receiving a user response in a specified period of time, the layer of
intelligence 218 escalates the intrusiveness of the notification method. Example notification
methods include, but are not limited to, sending the user an e-mail, sending SMS messages to 
the user's cell phone, sending pages to the user, and placing a voice call to the user on his cell 
phone, office phone, home phone, etc. 
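
The escalation described above can be sketched as a loop over notification methods ordered from least to most intrusive. The ordering shown and the `send` callback are assumptions for illustration; the text lists the methods but does not fix their ranking or how a response is detected.

```python
from typing import Callable, Optional, Sequence

# Assumed ordering, least intrusive first.
ESCALATION_ORDER = ("email", "sms", "page", "voice_call")


def notify_with_escalation(send: Callable[[str], bool],
                           methods: Sequence[str] = ESCALATION_ORDER) -> Optional[str]:
    """Try each method in turn and stop at the first one the user answers.

    `send(method)` is a hypothetical callback that delivers the notification
    and returns True if the user responded within the allowed time.
    """
    for method in methods:
        if send(method):
            return method
    # No method drew a response.
    return None
```

The return value names the method that finally reached the user, which a caller could log or use to tune the ordering for that user.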

The speech recognizer 210 receives the stream of digitized voice data from the 
telephony interface device 206. The speech recognizer 210 conducts digital signal processing 
on the incoming user's voice signal for comparison to a language module in order to send 
American Standard Code for Information Interchange (ASCII) text (or some other text format) 
to the speech module 212. The speech recognizer 210 can access multiple language modules 
such as an American English module or a Japanese language group. Part of the language 
module is a grammar file supplied by the speech module 212. The speech recognizer 210
compares groups of successive phonemes to an internal database of known words and the
expected responses in the grammar file. The speech recognizer 210 sends text corresponding
to the particular response in the dynamically generated grammar file to the speech module
212. A portion of the speech recognizer 210 contains adaptive filters that attempt to model 
and then nullify the communication channel and audio scene noise that is present in the 
digitized speech signal. 

The speech module 212 generates the grammar file sent to the speech recognizer 210. 
This grammar file contains anticipated responses based on the prompted options made
available to the user and/or statistically frequent responses. The user-specific contextual
information is used in determining the form of this grammar file. Some interactions, such as
delivery of information like news, weather, and e-mail, require only static grammar files 
because user responses/requests are known a priori. For example, the user might say "Read 
the item," "Delete the item," or "Skip the item." However, in more complex interactions, such 
as dealing with a flight cancellation, the range of possible user responses is
situation-dependent, requiring the speech module 212 to create a customized grammar file 226.
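
The difference between a static and a customized grammar can be sketched as follows. The phrase lists, the `interaction` labels, and the shape of `context` are hypothetical; a real grammar file would follow whatever format the speech recognizer 210 expects, which the text does not specify.

```python
from typing import Dict, List

# Static phrases quoted from the examples in the text.
STATIC_GRAMMAR = ["read the item", "delete the item", "skip the item"]


def build_grammar(interaction: str, context: Dict[str, List[str]]) -> List[str]:
    """Return the anticipated user responses for one interaction."""
    if interaction != "flight_cancellation":
        # News, weather, e-mail: responses are known in advance.
        return list(STATIC_GRAMMAR)
    # Situation-dependent case: phrases depend on the user's own itinerary.
    phrases = ["cancel my trip", "repeat the options"]
    for flight in context.get("alternative_flights", []):
        phrases.append(f"book flight {flight}")
    return phrases
```

For a canceled flight with alternatives 212 and 480, the generated list would include "book flight 212" and "book flight 480", phrases that would be meaningless to any other caller.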

The speaker verification module 208 receives the stream of digitized voice data from 
the telephony interface device 206. The speaker verification module 208 performs a biometric 
analysis of the user's voice to authenticate and verify the identity of the user. In response to a 
prompt, the user states his or her identity. The speech recognizer 210 communicates the user's 
stated identity to the speech module 212. The database 214 provides the speaker verification 
module 208 with the necessary voice print to verify that the user is who the user claims to
be. The speaker verification module 208 performs this verification by comparing the 
characteristics of the user's voice coming from the telephony interface device 206 to this 
voice print. After analyzing the comparison, the speaker verification module 208 determines a 
confidence level in the authenticity of the identity of the user. If this confidence level is above 
a certain threshold, which is set by the speech module 212, then the identity of the user is 
confirmed. After the speaker verification module 208 confirms the identity of the user, the 
speaker verification module 208 communicates to the speech module 212 that the user's 
identity has been properly verified. 
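
The threshold test at the end of the verification step can be sketched as below. The biometric analysis itself is not described in the text, so cosine similarity between two feature vectors is used here purely as an assumed stand-in for the confidence score; only the comparison against a threshold set by the speech module 212 comes from the description.

```python
import math
from typing import Sequence


def voice_print_confidence(sample: Sequence[float],
                           voice_print: Sequence[float]) -> float:
    """Cosine similarity, an assumed placeholder for the real confidence score."""
    dot = sum(a * b for a, b in zip(sample, voice_print))
    norm = (math.sqrt(sum(a * a for a in sample))
            * math.sqrt(sum(b * b for b in voice_print)))
    # Guard against empty or all-zero vectors.
    return dot / norm if norm else 0.0


def identity_confirmed(sample: Sequence[float],
                       voice_print: Sequence[float],
                       threshold: float) -> bool:
    """Confirm identity only when the confidence is above the threshold."""
    return voice_print_confidence(sample, voice_print) > threshold
```

Keeping the threshold as a parameter mirrors the text, where the speech module 212 rather than the verification module decides how strict the check is.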

As another aspect of the security characteristics of the voice user interface, information 
items in the database 214 are marked with a privacy level and a security level. The speech 
module 212 determines a security and privacy rating for a communication to a user based 
upon the user's environmental information. For example, if access to a communication device
202 is limited either physically or through a local authentication mechanism such as a 
Personal Identification Number to access a cell phone, then the communication device 202 
will be assigned a high level of security. Communications from the user's office phone are
likewise assigned a higher level of security than those from a public pay phone, for example. If the user is
using a communication device 202 with a low level of security, then the speech module 212 
changes the voice user interface by adding extra authentication steps. For example, if the user is
calling from a public pay phone, then the voice user interface may add an extra authentication
step such as, "Please state your mother's maiden name." The user is expected to say his or her
mother's maiden name. The user's response will be verified against data in the database 214
and possibly by the speaker verification module 208. In an embodiment, the speech module
212 assumes a high level of security, requiring a user only to state the user's name and satisfy a
voice print analysis. By default, the speech module 212 eliminates extra steps in the voice user 
interface and shortens call times whenever possible. 
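
A minimal sketch of this device-dependent authentication follows. The device categories, the rating labels, the step names, and the choice to treat an unrecognized device as low security are all assumptions made for illustration; the specification gives only the examples of a PIN-protected cell phone, an office phone, and a public pay phone.

```python
# Hypothetical device categories and security ratings.
DEVICE_SECURITY = {
    "pin_protected_cell_phone": "high",
    "office_phone": "high",
    "public_pay_phone": "low",
}

def authentication_steps(device_type):
    """Return the authentication steps for a call from the given device."""
    # Default: the user states a name and satisfies a voice print analysis.
    steps = ["state_name", "voice_print_analysis"]
    # Unknown devices are treated as low security (an assumption).
    if DEVICE_SECURITY.get(device_type, "low") == "low":
        steps.append("mothers_maiden_name")
    return steps
```

Starting from the shortest step list and only appending to it reflects the stated default of eliminating extra steps whenever possible.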

The speech module 212 may change the speaker verification confidence threshold 
based on channel characteristics. If the communication channel is noisy or in some other way 
impairs the performance of the speaker verification module 208, then the speech module 212 
may lower the threshold level and add extra authentication steps to the voice user interface as 
described above. For example, an analog cell phone connection often possesses a noisy 
communication channel. If the user is communicating to the system through a communication
channel or communication device 202 that has a low privacy rating (for example, a speaker 
phone or an analog cell phone connection, both of which are subject to eavesdropping), then 
the speech module 212 may ask the user if sensitive information assigned a high privacy
rating should be delivered at this time. 
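
The channel-dependent adjustments in this paragraph might be combined as in the sketch below. The amount by which the threshold is lowered (noise_allowance), the "low"/"high" labels, and the function and step names are hypothetical; the specification states only that the threshold may be lowered, that extra authentication steps may be added, and that the user may be asked before high-privacy information is delivered.

```python
def plan_verification(base_threshold, channel_is_noisy, channel_privacy,
                      item_privacy, noise_allowance=0.1):
    """Return the adjusted threshold, any extra authentication steps, and
    whether to ask the user before delivering the item."""
    threshold = base_threshold
    extra_steps = []
    if channel_is_noisy:
        # A noisy channel impairs verification, so accept a lower score
        # and compensate with an additional authentication step.
        threshold = max(0.0, base_threshold - noise_allowance)
        extra_steps.append("mothers_maiden_name")
    # Ask first when a high-privacy item would go over a low-privacy channel.
    ask_first = channel_privacy == "low" and item_privacy == "high"
    return {"threshold": threshold, "extra_steps": extra_steps, "ask_first": ask_first}
```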

The speech module 212 receives text representing the user's voice communication to
determine what grammar file and system prompts should be dynamically generated. 
Additionally, the speech module 212 analyzes the content of text from the speech recognizer 
210 in order to send a request to the database 214 to retrieve the information that the user is 
seeking. When the speech module 212 receives the desired information, the speech
module 212 communicates the information to the user by sending to the telephony interface
device 206 either a pre-recorded voice file 220, a dynamically generated computer voice
message created by the text to speech engine 222, or some combination thereof.

Data from various heterogeneous information sources is placed in the database 214. 
The layer of intelligence 218 assigns a priority level to each piece of information based upon 
the user-specific contextual information. The layer of intelligence 218 orders items of interest 
to a particular user from the database 214 into the top database table 216 based on the priority 
level determined above. The layer of intelligence 218 dynamically organizes the order in 
which the information items from the database 214 are presented to the user by placing the 
information items in priority order in the top database table 216. 
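
The ordering step can be illustrated with the following sketch. The dictionary fields ("user", "priority"), the numeric priority scale, and the limit on the table size are assumptions for illustration and are not taken from the specification.

```python
def build_top_table(items, user, limit=10):
    """Select one user's items and order them by descending priority."""
    mine = [item for item in items if item["user"] == user]
    # sorted() is stable, so items of equal priority keep their database order.
    return sorted(mine, key=lambda item: item["priority"], reverse=True)[:limit]
```

Because the sort key is the priority alone, items of different types (email, calendar, traffic) are interleaved rather than grouped by source.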

For example, a meeting at 2:00 p.m. at the client's headquarters exists on the user's 
PIM calendar. The driving directions from the user's last known location, the user's office, to 
the client's headquarters suggest driving on highway 101. The monitored traffic news reports 
an accident on highway 101 increasing the travel time by 20 minutes. The system may then 
raise the priority level of the traffic delay information and the potential schedule conflict 
information so that the system communicates this information to the user immediately after
the user identifies himself. Additionally, if this newly acquired information causes a conflict
with the user-specific contextual information, for example because the user is scheduled to be
in a meeting until 1:30 p.m., then the layer of intelligence 218 may increase the
item's priority level and signal the speech module 212 to send a notification to the user by
using the telephony interface device 206.
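
The conflict check in this example reduces to simple time arithmetic, sketched below. The specification gives the 1:30 p.m. and 2:00 p.m. times and the 20-minute delay but not the normal driving time, so the 20-minute normal drive used in the usage note, the calendar date, the priority boost, and the field names are assumptions.

```python
from datetime import datetime, timedelta

def causes_conflict(free_at, meeting_start, normal_drive, delay):
    """True if the user can no longer arrive before the meeting starts."""
    return free_at + normal_drive + delay > meeting_start

def reprioritize(item, conflict, boost=5):
    """Return a copy of the item, with its priority raised and a
    notification flag set when a conflict exists."""
    if not conflict:
        return dict(item)
    return {**item, "priority": item["priority"] + boost, "notify": True}
```

With the user free at 1:30 p.m., an assumed 20-minute normal drive, and the reported 20-minute delay, the earliest arrival is 2:10 p.m., which is after the 2:00 p.m. meeting, so the item would be raised in priority and flagged for notification.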

The layer of intelligence 218 assigns a sensitivity and security level to the items in the
database 214. Information items that are confidential or personal in nature, for example, a 
message from a spouse or an email marked "confidential", may be marked at higher sensitivity 
levels, meaning that the user may not want others to hear them. The security level is used so 
that content providers may stipulate the delivery mechanisms that are acceptable to them. For 
example, a corporate email system may let the user set an outgoing email to a high security 
level. The high security level indicates to the voice processing system that the item should not 
be delivered over less-secure delivery channels. 
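
One way to express the security-level restriction is shown in the sketch below. The three-step scale and the rule that a channel must be rated at least as high as the item are assumptions; the specification says only that a high security level keeps an item off less-secure delivery channels.

```python
LEVELS = {"low": 0, "medium": 1, "high": 2}  # hypothetical three-step scale

def may_deliver(item_security, channel_security):
    """An item may be delivered only over a channel rated at least as
    secure as the item itself."""
    return LEVELS[channel_security] >= LEVELS[item_security]

def deliverable_items(items, channel_security):
    """Filter out items whose security level the channel does not meet."""
    return [item for item in items if may_deliver(item["security"], channel_security)]
```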

Figure 3 illustrates a flow chart of how the speech module 212 generates a grammar
file for an information item in the top database table 216. In step 302, the speech module 212 
calls for the first item of information in the top database table 216. In step 304, the speech 
module 212 determines the type of the information item such as a news article, schedule 
reminder, or flight cancellation. The speech module 212 also examines the priority level 
assigned to that item and any sensitivity level assigned to that item. If the communication 
device 202 is a low privacy device such as a speakerphone and the sensitivity of the item is 
high enough, then the speech module 212 may add a prompt to the voice user interface asking 
the user if it is okay to send the sensitive information at this time. In step 306, the speech 
module 212 retrieves the static grammar file 224 for that item type. In step 308, if the speech
module 212 determines that grammar customization is required, then the speech module 212 performs
the customization, creating a customized grammar file 226. In step 310, the speech module
212 delivers the (possibly customized) grammar file to the speech recognizer 210. The speech
recognizer 210 uses the grammar file to improve the system's overall speech recognition and
comprehension of the user's actual request/response.
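
Steps 304 through 310 can be sketched as follows. For illustration a grammar is modeled as a plain list of phrases, which is a simplification and not the format of the grammar files 224 and 226. The item type names, the "alternative_flights" field, and the rebooking phrase are hypothetical; only the "read", "delete", and "skip" phrases come from the description.

```python
# Static grammars keyed by item type (phrases for flight_cancellation are assumed).
STATIC_GRAMMARS = {
    "news": ["read the item", "delete the item", "skip the item"],
    "email": ["read the item", "delete the item", "skip the item"],
    "flight_cancellation": ["read the item", "skip the item"],
}

def generate_grammar(item):
    """Build the grammar that would be handed to the speech recognizer."""
    item_type = item["type"]                      # step 304: determine the item type
    grammar = list(STATIC_GRAMMARS[item_type])    # step 306: copy the static grammar
    if item_type == "flight_cancellation":        # step 308: customize if required
        for flight in item.get("alternative_flights", []):
            grammar.append(f"book me on flight {flight}")
    return grammar                                # step 310: deliver to the recognizer
```

Copying the static list before customizing it keeps the shared static grammar unchanged from one item to the next.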

The speech module 212 dynamically determines the call flow of the voice user 
interface. This dynamic determination is based on factors such as the priority level of the data, 
the user's location and communication device, the sensitivity level of the data, the current task 
the user is engaged in, and other factors particular to the user that the system monitors. The
speech module 212 may change the voice user interface from a passive posture of simply
responding to the user's requests to an active posture of notifying the user of information from
the top database table 216 assigned a high enough priority. 

Figure 4 illustrates a flow chart of a dynamically generated call-flow in the voice user 
interface. In step 400, Carl, the user, connects to the system through his office phone. In step
402, a prompt welcomes the user. After the prompt, Carl identifies himself. In step 404, the
speaker verification module 208 and speech module 212 authenticate Carl's identity.
Additionally, the system determines the user's environmental information. In step 406, the
speech module 212 proactively presents to Carl items from the top database table 216
assigned a high enough priority that require Carl's urgent attention. In step 408, if no such
high priority items exist, then the voice user interface passively prompts Carl, "What can I do
for you?" 
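
The choice between the active and passive postures in steps 406 and 408 can be sketched as below. The sketch assumes that authentication in step 404 has already succeeded; the numeric cutoff urgent_priority and the item fields are hypothetical.

```python
def call_flow(top_table, urgent_priority=8):
    """Return the prompts spoken after the caller has been authenticated."""
    prompts = ["Welcome to the System."]            # step 402: welcome prompt
    urgent = [item["summary"] for item in top_table
              if item["priority"] >= urgent_priority]
    if urgent:
        prompts.extend(urgent)                      # step 406: proactive alerts
    else:
        prompts.append("What can I do for you?")    # step 408: passive prompt
    return prompts
```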

Thus, based on user-specific contextual information, environmental information, the 
sensitivity of the information being communicated to the user, and/or the priority level 
assigned to the information being communicated to the user, the speech module 212 changes
the structure of the voice user interface by: eliminating authentication steps; eliminating
non-relevant call-flow steps for items in the top database table 216; changing the voice user
interface from passively responding to a user's request to proactively alerting the user to
information assigned a high priority level; and changing the order in which information is
delivered, even across information types (for example, interleaving email, calendar, and stock
information without forcing the user to navigate a series of menus to reach these
heterogeneous pieces of information).

A further example follows in which the speech module 212 changes the voice user interface to
eliminate non-relevant steps based upon the system's knowledge of the user-specific
contextual information. In this example, Carl connects to the system with a cellular phone
and asks the system for directions from the airport to his hotel:
System: Welcome to the System. What can I do for you?
Carl: It is Carl Weathersby.
System: Hi Carl. To verify your identity, please say "Mice like green cheese".
Carl: Mice like green cheese.
System: Thank you. No urgent items need attention. What can I do for you?
Carl: I need directions to the hotel.
System: One moment... Before we begin, you will need $1.25 for a toll during the
        trip. You may want to have it handy. The directions to the Montgomery Hotel
        are as follows. Follow the signs out of the airport and tell me when you are
        nearing the exit out of the airport if I don't speak first.
A moment later
Carl: I am there.
System: Make a left onto highway 1A, heading west toward Boston. Stay on 1A for 12
        miles. Tell me when you pass exit 11 if I don't speak first.
15 minutes later...
System: Carl, you are nearing exit 11. You are about to enter the Summer Tunnel. The
        toll is $1.25. We will lose this cellular connection while you are in the
        tunnel. Please call back on the other side.
Carl: Goodbye.
After the tunnel
System: Welcome to the System. What can I do for you?
Carl: It is Carl Weathersby.
System: Hi Carl. Please take exit 11, Commercial Street... Head north on
        Commercial Street for 2 miles...
When Carl called back after exiting the tunnel, the voice user interface did not force Carl to
go through the same authentication steps, and the system jumped directly back into the task
that was interrupted by the tunnel. The speech module 212 dynamically generated the
structure and content of the voice interface based on the user context (in this case, knowledge
of Carl's location, current task and prior activity). Additionally, the system was either told or
anticipated Carl's context and appropriately increased the priority level of several information
items. The increase in the priority level of these information items, such as the toll
information, cellular connection information, and anticipatory traffic interchange information,
caused the voice user interface to communicate these items without Carl requesting them.
Furthermore, the speech module 212 accessed the database 214 to cross-reference Carl's
generic request for directions to the hotel against Carl's itinerary stored in the database 214. The
speech module 212 used the information known about the user in order
to give Carl proper directions to the Montgomery Hotel rather than forcing the voice user
interface to create prompts asking Carl, "Directions to what hotel?" and "Where are you
now?".
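
The two behaviors in this example, resuming an interrupted task and resolving a generic request from stored context, might be sketched as follows. The in-memory dictionary of interrupted tasks, the step names, and the "hotel" key of the itinerary are hypothetical; the specification does not describe how the system records the interrupted task or recognizes the returning caller.

```python
def prompts_for_call(user, interrupted_tasks):
    """Skip repeated authentication steps and resume if a task was interrupted."""
    task = interrupted_tasks.pop(user, None)
    if task is not None:
        return ["greet", "resume:" + task]
    return ["greet", "voice_print_challenge", "ask_request"]

def resolve_hotel(request, itinerary):
    """Fill in the destination from the stored itinerary when the user
    simply says "the hotel"."""
    if "hotel" in request.lower() and itinerary.get("hotel"):
        return itinerary["hotel"]
    return None  # the interface would have to ask "Directions to what hotel?"
```

Removing the task from the dictionary once it is resumed means that a later, unrelated call from the same user goes through the normal authentication steps again.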

In an embodiment, a computer program directs and controls the operation of the voice 
user interface. This program can be embodied onto a machine-readable medium. A machine- 
readable medium includes any mechanism that provides (i.e., stores and/or transmits) 
information in a form readable by a machine (e.g., a computer). For example, a machine- 
readable medium includes read only memory (ROM); random access memory (RAM); 
magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, 
acoustical or other form of propagated signals (e.g., carrier waves, infrared signals, digital 
signals, etc.); etc. 

Most functions performed by electronic hardware components may be duplicated by 
software emulation. Similarly, processing capability of a central processing unit (CPU) or 
digital signal processor (DSP) on any board or device may be transported to a CPU or DSP 
located on any board or device. For example, in an alternative embodiment the processing of 
information that occurs in the layer of intelligence 218 could be transported to the speech 
module 212. Additionally, the telephony interface device 206, speech recognizer 210, or another
component may determine the type of communication device 202 without involving the
speech module 212 or the database 214. Furthermore, in an alternative embodiment, the
speech recognizer 210 detects and communicates the audio scene and channel characteristics 
signal to the speech module 212. Therefore, a person skilled in the art will appreciate that 
various deviations from the described embodiments of the invention are possible and that
many modifications and improvements may be made within the scope and spirit thereof. The
invention is to be understood as not limited by the specific embodiments described herein, but
only by the scope of the appended claims.
CLAIMS



I claim: 

1. A system comprising: 

a voice user interface possessing both operational characteristics and security 
characteristics; 

a database to store user-specific contextual information; and 

a computer program to use the user-specific contextual information to dynamically change 
the operational characteristics of the voice user interface. 

2. The system of claim 1 wherein the changed operational characteristic of the voice user 
interface is a setting of a barge-in level. 

3. The system of claim 1 wherein the changed operational characteristic of the voice user 
interface is a generation of a grammar file. 

4. The system of claim 1 wherein the changed operational characteristic of the voice user 
interface is a reduction of non-speech audio components in the processing of a communication 
from the user. 

5. The system of claim 1 further comprising: 

the computer program to use the user-specific contextual information to dynamically 
change the security characteristics of the voice user interface. 

6. The system of claim 5 wherein the security characteristic of the voice user interface is a 
biometric analysis to authenticate an identity of the user. 

7. The system of claim 1 wherein the user-specific contextual information comprises: 
an identity of the user; and
a current location of the user. 

8. The system of claim 1 wherein the user-specific contextual information comprises: 

an identity of the user; and 
a current task of the user. 

9. The system of claim 1 further comprising: 

the computer program to change the security characteristics of the voice user interface 
based upon the sensitivity of the information being communicated to the user.

10. The system of claim 1 further comprising: 

the computer program to use environmental information to dynamically change the 
security characteristics of the voice user interface. 

11. The system of claim 10 wherein the security characteristic of the voice user interface is an
addition of an authentication step to authenticate an identity of the user.

12. The system of claim 10 wherein the environmental information is communicated to the 
system by the user. 

13. The system of claim 10 wherein the environmental information comprises audio scene 
information at the location of the user. 

14. The system of claim 13 wherein the environmental information is determined by the 
system by comparing the audio scene characteristics at the user's location to known references 
and selecting the matching environmental scene. 

15. A method comprising: 
changing a barge-in level of a voice processing system by using user-specific
contextual information; and 

changing security requirements of a voice processing system that the system requires 
from the user based upon user-specific contextual information. 

16. The method of claim 15 further comprising: 

changing the barge-in level of the system by using environmental information. 

17. The method of claim 15 further comprising: 

using a dynamically generated grammar file to enhance the ability of the system to 
recognize communications from the user. 

18. The method of claim 15 further comprising: 

using biometric analysis to authenticate an identity of the user.

19. The method of claim 15 further comprising: 

changing security requirements of the system that the system requires from the user 
based upon environmental information. 

20. The method of claim 15 further comprising: 

changing security requirements of the system that the system requires from the user 
based upon the sensitivity of the information being delivered to the user. 

21. An apparatus comprising: 

a means for changing the barge-in level of a voice processing system by using user- 
specific contextual information; and 

a means for changing security requirements of a voice processing system that the 
system requires from the user based upon user-specific contextual information. 



22. The apparatus of claim 21 further comprising: 
a means for changing the barge-in level of the system by using environmental
information. 

23. The apparatus of claim 21 further comprising: 

a means for generating a grammar file to enhance the ability of the system to
recognize communications from the user.

24. A machine-readable medium that provides instructions, which when 
executed by a machine, cause the machine to perform operations comprising: 

storing user-specific contextual information; and 

changing security requirements of a voice processing system that the system requires 
from the user based upon user-specific contextual information. 

25. The machine-readable medium of claim 24, which causes the machine to perform the 
further operations comprising: 

changing security requirements of the system that the system requires from the user 
based upon environmental information. 

26. The machine-readable medium of claim 24, which causes the machine to perform the 
further operations comprising: 

changing security requirements of the system that the system requires from the user 
based upon the sensitivity of the information being delivered to the user. 

27. A machine-readable medium that provides instructions, which when 
executed by a machine, cause the machine to perform operations comprising: 

changing the barge-in level of a voice processing system by using user-specific
contextual information; and

changing security requirements of the voice processing system that the system requires
from the user based upon user-specific contextual information. 

28. The machine-readable medium of claim 27, which causes the machine to perform the 
further operations comprising: 

changing the barge-in level of the system by using environmental information. 

29. The machine-readable medium of claim 27, which causes the machine to perform the 
further operations comprising: 

generating a grammar file to enhance the ability of the system to recognize
communications from the user.

ABSTRACT



The invention generally relates to a method, apparatus, and system capable of 
changing a voice user interface possessing both operational characteristics and security 
characteristics based upon user-specific contextual information. The voice processing system 
consists of at least the following components: a voice user interface possessing both 
operational characteristics and security characteristics; a database to store user-specific 
contextual information; and a computer program to use the user-specific contextual 
information to dynamically change the operational characteristics of the voice user interface.
associated with the filing and prosecution of a patent application has a duty of candor and good faith in
dealing with the Office, which includes a duty to disclose to the Office all information known to that individual
to be material to patentability as defined in this section. The duty to disclose information exists with respect
to each pending claim until the claim is cancelled or withdrawn from consideration, or the application becomes
abandoned. Information material to the patentability of a claim that is cancelled or withdrawn from
consideration need not be submitted if the information is not material to the patentability of any claim
remaining under consideration in the application. There is no duty to submit information which is not material
to the patentability of any existing claim. The duty to disclose all information known to be material to
patentability is deemed to be satisfied if all information known to be material to patentability of any claim
issued in a patent was cited by the Office or submitted to the Office in the manner prescribed by §§ 1.97(b)-(d)
and 1.98. However, no patent will be granted on an application in connection with which fraud on the Office
was practiced or attempted or the duty of disclosure was violated through bad faith or intentional misconduct.
The Office encourages applicants to carefully examine: 

(1) Prior art cited in search reports of a foreign patent office in a counterpart application, and 

(2) The closest information over which individuals associated with the filing or prosecution of a 
patent application believe any pending claim patentably defines, to make sure that any material information 
contained therein is disclosed to the Office. 

(b) Under this section, information is material to patentability when it is not cumulative to 
information already of record or being made of record in the application, and

(1) It establishes, by itself or in combination with other information, a prima facie case of
unpatentability of a claim; or 

(2) It refutes, or is inconsistent with, a position the applicant takes in: 

(i) Opposing an argument of unpatentability relied on by the Office, or 

(ii) Asserting an argument of patentability. 

A prima facie case of unpatentability is established when the information compels a conclusion that a claim is 
unpatentable under the preponderance of evidence, burden-of-proof standard, giving each term in the claim 
its broadest reasonable construction consistent with the specification, and before any consideration is given to 
evidence which may be submitted in an attempt to establish a contrary conclusion of patentability. 

(c) Individuals associated with the filing or prosecution of a patent application within the 
meaning of this section are: 

(1) Each inventor named in the application;

(2) Each attorney or agent who prepares or prosecutes the application; and 

(3) Every other person who is substantively involved in the preparation or prosecution of the 
application and who is associated with the inventor, with the assignee or with anyone to whom there is an 
obligation to assign the application. 

(d) Individuals other than the attorney, agent or inventor may comply with this section by 
disclosing information to the attorney, agent, or inventor. 